tickets/SP-3008: Bug fixes and improved tools for comparison of completed visits and pre-night sims by ehneilsen · Pull Request #196 · lsst/schedview

ehneilsen · 2026-03-12T20:09:30Z

This started off as a bug fix for comparenight being too picky about how it was matching simulated visits to visits in the prenight simulation, failing to match visits when the pointing was not in the same healpixel. This change replaces the simple healpix-based strategy with coordinate matching taking advantage of astropy's catalog matching tools.

While debugging, I also added several improved tools for showing tables and figures that compare simulated with actual visits. I think these greatly improve on what was in comparenight before. There will be a corresponding PR on schedview_notebooks (which will depend on this PR in schedview) shortly.

rhiannonlynne

I have comments, because some of the docstrings could be clearer and some were actually completely opaque as they are.

One thing in particular I had not taken into account was that you were putting together all of the different simulation visits, plus all of the completed visits, into one big data frame. This is not easy to tell from the information that is available in this module.

I suspect that it may be starting to make sense to make a class to handle "all of the data acquisition from a night", including fetching the given set of simulation visits, the completed visits, and then concatenating them in the way that currently only exists in the notebook. You later end up depending on having the data in this particular format, but it's not an obvious format and not created anywhere within this code base -- which to me, means that maybe this should be shifted to a class, with methods to fetch and manipulate the data, so that you can maintain the particular data format you want.

rhiannonlynne · 2026-03-24T22:06:37Z

schedview/compute/comparesim.py

+from .multisim import match_visits_across_sims
+
+
+def assign_field_hpids(


This probably needs a different name, to reduce confusion about what it does. "assign_field_hpids" doesn't imply matching, but one of the requirements for this function is two different data frames to match (and this is what drives the hpid assignment, if I'm getting it right).

Perhaps it could simply be "match_pointings_hpid" ?

Or maybe "assign_matching_field_hpids"?
I don't know .. just assign_field_hpids is accurate but makes me think you could have done what you probably did first (just assign each pointing to the nearest hpid).

How about I modify this to just assign each pointing to the nearest hpid, and then have a separate function, snap_to_referece_hpid_when_close, that changes assigned hpids to one those from a reference set when the coordinates are close, and then have the notebook call both in succession?

By the time you get rid of the matching, there's really nothing left but the call to hp.ang2pix, so maybe we should just drop the assign_field_hpids function entierly and have the notebook call hp.ang2pix directly, followed by snap_to_reference_hpid_when_close.

I think you're maybe just (at the end of the day) proposing the new name here then?

would then "assign_matching_reference_hpids" or something work?

Mostly, with the difference that the initial setting of the id is by a direct call to hp.ang2pix in the notebook, and a separate call to snap to the reference. This does result in more being done in the notebook, but has the advantage that in the notebook it makes it clear what's going on. I'm happy either with your proposed renaming, or with pulling hp.ang2pix out.

rhiannonlynne · 2026-03-24T22:07:23Z

schedview/compute/comparesim.py

+    simulated_visits : `pd.DataFrame`
+        Simulated visit table; must contain columns identified by ``ra_col``
+        and ``dec_col`` (both in degrees).  An integer column named
+        ``hpid_col`` will be added or overwritten.


These are the visit pointings that are being used as the basis for the match, correct? Might be useful to make that comment here.

And I see (from the notebook) that it's actually the simulated visits from all of the simulations, not a single sim.
Are there failure modes in matching, if different simulations had slightly different field centres?

If different simulations have different field centers, they will match if they have they are close enough to correspond to the same healpix, and not otherwise.
It's not clear to me whether this is actually a failure, though.
What we're trying to do here is to match hpids when a completed visit corresponds to the visit in a simulation.
If different simulations have different field centers, then the completed visit could have come from a scheduler request that matches at most one of them, and the simulation that it is nearest to seems like the best guess (but not guaranteed).

An alternative would be to ignore dithering altogether: instead of matching to field centers in simulation, do the catalog match to the original undithered WFD tessellation, and match both coordinates in completed and all the simulations to these points. This would mean, however, that pointings could be a large fraction of a camera diameter off and still be considered a "match."

rhiannonlynne · 2026-03-24T22:08:56Z

schedview/compute/comparesim.py

+        and ``dec_col`` (both in degrees).  An integer column named
+        ``hpid_col`` will be added or overwritten.
+
+    completed_visits : `pd.DataFrame`


Will the completed visits hpid be overwritten, when compared against a new simulation?

If it is set already (and inplace=True), yes. I'll note that in the docstring.

rhiannonlynne · 2026-03-24T22:16:43Z

schedview/compute/comparesim.py

+        Completed visit table; same column requirements as
+        ``simulated_visits``.
+
+    field_hpix_nside : `int`


Is there a reasonable useful default for this?
I imagine that your testing has shown that some values tend to work pretty well, while others are overkill and others are way too coarse. I'm guessing that either 32 or at the outside 64 are best choices?

I've been using 2*18=262144, or about 0.8" resolution, but highter would be just as good. My goal here was to make sure that it is high enough that simulated pointings that are different should have different hpids, and I wasn't concerned with it being too high because matching sims that match each other should have exactly matching coordinates.

rhiannonlynne · 2026-03-24T22:21:22Z

schedview/compute/comparesim.py

+
+    field_coord_tolerance_deg : `float`
+        Maximum angular separation (deg) for a completed visit to be considered
+        a match to a simulated healpix center.


Likewise, is there a recommendation for this? I imagine it's on the order of the small offsets you've been noticing, which ought to hold up as long as the tessellation we're using matches the tessellation on the summit, for that night.
I imagine that the recommendation is something like
"On a typical night, where the field tessellation on the summit matches the tessellation in the simulation (for wide-area surveys), the tolerance is likely around 0.001 (??) degrees. For dithering around field centres and for mismatches in tessellation, the tolerance might be set larger, potentially up to 1.7 degrees." ( ??)

rhiannonlynne · 2026-03-24T22:35:31Z

schedview/compute/comparesim.py

+        within_tol = sim_hp_sep.deg < coord_match_tolerance_deg
+        completed_visits.loc[within_tol, hpid_col] = sim_hpids[sim_hp_match_idx[within_tol]]
+
+    return simulated_visits, completed_visits


I was wondering why we couldn't pass in multiple simulated_visits data frames at the same time, but I think it's because you're matching the completed visits to those simulated visits, and then the completed_visits can end up with different healpix ids depending on which simulation it's been matched against.

As I think you noticed elsewhere, I pass visits from all simulations in the same DataFrame.

rhiannonlynne · 2026-03-25T06:49:29Z

schedview/compute/comparesim.py

+        The simulation index to compare against the observation (index 0).
+    visits : `pd.DataFrame`
+        Table of visits that contains at least the columns ``sim_index`` and
+        ``start_timestamp``.


I don't understand from this docstring what "visits" is .. it seems like it's not just the information from either the completed visits or a single simulation, but potentially both? How are they combined?
Is there some function somewhere you'd use to create the "visits" data frame and could that be referenced here?

Yes, visits is the DataFrame with all visits, both from completed and simulated simulations.

rhiannonlynne · 2026-03-25T06:50:38Z

schedview/compute/comparesim.py

+    Parameters
+    ----------
+    sim_index : `int`
+        The simulation index to compare against the observation (index 0).


What is sim_index? Is it part of how the "visits" data frame referenced next was created?

It's an id that indentifies sequences of visits. That is, the visits DataFrame contains all visits, including the simulations and completed visits. The sim_index column of the visits dataframe identifies which sim (or completed) that specific visit came from.

rhiannonlynne · 2026-03-25T07:07:53Z

schedview/compute/comparesim.py

+            dec=completed_visits[decl_col].values * u.deg,
+            frame="icrs",
+        )
+        sim_hp_match_idx, sim_hp_sep, _ = completed_coords.match_to_catalog_sky(sim_hp_coords)


Given that you're using a really high side by default for assigning the sim visits to healpix ids -- could there be a potential failure where:
sim 1 has field centres which are slightly different than the field centres in sim 2
then the completed visits would only match either the healpix in sim 1 or sim 2, but if you'd used a lower nside they would both have actually matched and both been within the tolerance?

See my comment on the corresponding comment on the docstring. My intention was to distinguish dithers, so completed visits in one dither not match simulated visits in a different dither, but rather only accommodate differences between the coordinates requested by the FBS and the coordinates reported in ConsDB.

rhiannonlynne · 2026-03-25T07:12:30Z

To be honest, I'm still a little confused about the matching, after seeing how you're actually fetching and using the data frame.
With current on-sky visits from the FBS, we don't re-tesselate within the night, but that isn't guaranteed to be true in general (if, for example, we had same-filter pairs, we might want to dither the tessellation within the night to avoid chip gap overlaps), and while we might make that re-tesselation repeatable, I could imagine it being something that is also random or depends on the time of night, etc. So that different simulations might end up with field centres that are offset from each other, which would (at a high side) result in them having different healpix IDs.
But then the real visits that are "close enough" to the centre of the field pointing .. would they match up with only one of the sims (the closest) or would they match all of them (assuming in all cases that the offset in the field centre is less than the distance tolerance defined)?

ehneilsen · 2026-03-25T17:16:40Z

I suspect that it may be starting to make sense to make a class to handle "all of the data acquisition from a night", including fetching the given set of simulation visits, the completed visits, and then concatenating them in the way that currently only exists in the notebook. You later end up depending on having the data in this particular format, but it's not an obvious format and not created anywhere within this code base -- which to me, means that maybe this should be shifted to a class, with methods to fetch and manipulate the data, so that you can maintain the particular data format you want.

I dislike the idea of putting this into a class because it interferes with code reuse. Most of what I want to do with the visits table involves pandas.DataFrame operations or passing visits as an argument to functions that take a pandas.DataFrame (e.g. bokeh functions).

Instead, I propose defining and documenting a python NewType:

from typing import NewType

VisitsDF = NewType('VisitsDF', pd.DataFrame)
# Document columns and dtypes here

This will make VisitsDF look like a subclass of pd.DataFrame from the point of view of a type checker, so if you pass a plain pd.DataFrame to a functions with a type hint of VisitsDF the type checker will complain, but it will be fine if you pass a VisitsDF as argument v to a function with a type hint of pd.DataFrame for argument v.

From the point of view of the runtime python interpreter, it VisitsDF will just be a synonym for pd.DataFrame.

So, using NewType will document what is going on and help pre-commit checks catch issues, but won't actually interfere with using it (unlike creating a class).

Alternately, I am aware of pandera, a module for enforcing specific schema for data frames. I have never tried it, but would be open to it.

ehneilsen · 2026-03-25T17:30:56Z

I will move the construction of the DataFrames from the notebook into functions in schedview that return instances of VisitsDF (or whatever I call it): I agree that moving as much code as possible out of the notebooks and into schedview proper where it can be type checked, unit tested, etc. is a good idea. I just don't want the object I get back from these functions to be hard to deal with with standard tools the way a custom class would be.

rhiannonlynne · 2026-03-25T18:35:47Z

I don't really like the creation of the NewType, because if you really want it to be a data frame for use in other places, then it should just be a data frame .. making it a specific type seems likely to lead to (at some point) relying on a specific property (either data value or behaviour) of the NewType .. and then it's not just a data frame anymore anyway.

Ok - leave it as a pandas data frame, but indicate where the simulated_visits ought to be coming from (from querying for all of the simulated visits in the night) and then .. does it make sense to just combine the simulated and completed visits when returning from the assign_hpid_field_centers function?
Or otherwise have another function which does the concat in the way you want it to (with the addition of the extra metadata), indicate that for the computing of offsets, you should use the output of that new function?

ehneilsen · 2026-03-25T19:10:31Z

I don't really like the creation of the NewType, because if you really want it to be a data frame for use in other places, then it should just be a data frame .. making it a specific type seems likely to lead to (at some point) relying on a specific property (either data value or behaviour) of the NewType .. and then it's not just a data frame anymore anyway.

The reason why I proposed using NewType instead of creating a subclass was to reduce the temptation to add or change actual functionality in the subclass. Another option would be to use a TypeAlias, in which case the python interpreter wouldn't even be able to tell the difference:

>>> from typing import TypeAlias
... 
... class Foo():
...     pass
... 
... Bar: TypeAlias = Foo
... 
... bar = Bar()
... foo = Foo()
... 
... foo.__class__ == bar.__class__, isinstance(foo, Bar), isinstance(bar, Foo)
... 
(True, True, True)
>>> foo.__class__.__name__
'Foo'
>>> bar.__class__.__name__
'Foo'

Or, we can leave it as a DataFrame. I'm happy with any of these.

So yes, the best answer may be just for me to add better docstrings to the functions that take the DataFrame as arguments describing exactly what columns of what types it uses, and what they should mean, and referencing functions that can be used to produce conformant DataFrames.

I am in the process of writing such functions now (moving more code from the notebooks to schedview).

rhiannonlynne · 2026-03-25T19:42:37Z

I moved some of the discussion about matching to slack here since maybe this is a question that would benefit from other viewpoints.
I can see your point, but I also think that I would have said any visit that came some relatively large fraction of the FOV to our requested visit was actually a "match" to a given simulation.
Also, I verified that the offsets between requested ra/dec and acquired ra/dec range up to 4" or even 4.5" for some nights ... maybe a "safe" value would be a minimum of 5" for a precise target/boresight matchup.

ehneilsen requested a review from rhiannonlynne March 12, 2026 20:09

ehneilsen mentioned this pull request Mar 12, 2026

tickets/SP-3008: Rework timing offset tables and plots in comparenight lsst/schedview_notebooks#147

Open

ehneilsen force-pushed the tickets/SP-3008 branch 2 times, most recently from 0d5e4b2 to 7c25abc Compare March 17, 2026 16:18

ehneilsen and others added 16 commits March 19, 2026 09:28

Utilities for comparing observed to simulated visits

a60f40a

Add compute_common_fraction

bb94e96

fade sims in visit_vs_time plot

0934cb4

add plot_obs_vs_sim_time

fba1f38

tests for assign_field_hpids

9caadc9

tests for offsets_of_coord_band

1e188b6

Improve rng handling

b508741

add tests for compute_obs_sim_offsets

9eea967

add asserts to offsets_of_coord_band

47e9a47

Add tests for compute_offset_stats

680f212

Fix future warning

d3d0b29

quiet performance warning

37a1937

Add docstring to compute_common_fractions

bba2d34

Add tests for compute_common_fractions

aed3780

Add plot_obs_vs_time test

8c6de5e

isort

85c963f

ehneilsen force-pushed the tickets/SP-3008 branch from 7c25abc to 85c963f Compare March 19, 2026 16:28

ehneilsen added 2 commits March 19, 2026 14:20

remove compute_common_fraction

24b021f

Add default_sim option to plot_visit_param_vs_time

486583f

rhiannonlynne approved these changes Mar 25, 2026

View reviewed changes

Minor updates for PR comments

ba6c1eb

		from .multisim import match_visits_across_sims


		def assign_field_hpids(

Conversation

ehneilsen commented Mar 12, 2026

Uh oh!

rhiannonlynne left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ehneilsen Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rhiannonlynne commented Mar 25, 2026

Uh oh!

ehneilsen commented Mar 25, 2026

Uh oh!

ehneilsen commented Mar 25, 2026

Uh oh!

rhiannonlynne commented Mar 25, 2026

Uh oh!

ehneilsen commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhiannonlynne commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ehneilsen Mar 25, 2026 •

edited

Loading

ehneilsen commented Mar 25, 2026 •

edited

Loading